Zipf's Law and the Frequency of Characters or Words of Oracles
Abstract
The article discusses the frequency of characters in Oracle texts and concludes that the relationship between the frequency and the rank of a word or character fits the Zipf-Mandelbrot law, that is, Zipf's law with three parameters. It estimates the parameters from the frequency data and points out that what some Oracle researchers call "assembling on the two ends" is only an impressionistic description of the Oracle data.
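The abstract does not say how the three parameters are estimated, so the following is only a minimal sketch of one common approach, not the article's method. It assumes the usual Zipf-Mandelbrot form f(r) = c / (r + b)^a. The names `c`, `b`, `a`, `zipf_mandelbrot` and `fit_zipf_mandelbrot` are hypothetical. The shift `b` is found by grid search and the other two parameters by least squares in log space. The frequencies in the usage lines are synthetic, not Oracle data.

```python
import math


def zipf_mandelbrot(rank, c, b, a):
    """Predicted frequency under the assumed form f(r) = c / (r + b) ** a."""
    return c / (rank + b) ** a


def fit_zipf_mandelbrot(freqs, b_grid=None):
    """Estimate (c, b, a) from positive frequencies sorted in descending order.

    For a fixed shift b, log f = log c - a * log(r + b) is linear in
    log(r + b), so log c and a follow from ordinary least squares.
    The b on the grid with the smallest squared error in log space is kept.
    """
    if b_grid is None:
        b_grid = [i * 0.1 for i in range(0, 101)]  # candidate shifts 0.0 .. 10.0
    n = len(freqs)
    ys = [math.log(f) for f in freqs]  # requires every frequency > 0
    mean_y = sum(ys) / n
    best = None
    for b in b_grid:
        xs = [math.log(r + b) for r in range(1, n + 1)]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), b, -slope)
    _, c, b, a = best
    return c, b, a


# Usage with synthetic frequencies generated from known parameters.
synthetic = [zipf_mandelbrot(r, 1000.0, 2.0, 1.2) for r in range(1, 201)]
c_hat, b_hat, a_hat = fit_zipf_mandelbrot(synthetic)
print(round(c_hat, 2), round(b_hat, 2), round(a_hat, 2))
```

Fitting in log space weights low-frequency ranks as heavily as high-frequency ones. A fit on raw frequencies, or a maximum-likelihood fit, may give different parameters on real data.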
Similar resources
Extension of Zipf's Law to Word and Character N-grams for English and Chinese
It is shown that for a large corpus, Zipf's law for both words in English and characters in Chinese does not hold for all ranks. The frequency falls below the frequency predicted by Zipf's law for English words for rank greater than about 5,000 and for Chinese characters for rank greater than about 1,000. However, when single words or characters are combined together with n-gram words or chara...
Rank-frequency relation for Chinese characters
We show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Z...
Zipf's Law and Statistical Data on Modern Tibetan
In this paper, a large-scale modern Tibetan text corpus is built, which includes about 190 thousand documents, 67.21 million words, and 93.66 million syllables in total. Based on the corpus, statistics are computed for several language units at different granularities. Statistical data show that: a syllable has 3.26 letters or 2.20 super characters on average, while a sentence has 75.40 letters or 63....
Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages like Chinese, Japanese and Korean. These languages consist of characters and have very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, a...
Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings
The word-frequency distribution of a text written by an author is well accounted for by a maximum entropy distribution, the RGF (random group formation)-prediction. The RGF-distribution is completely determined by the a priori values of the total number of words in the text (M), the number of distinct words (N) and the number of repetitions of the most common word (k(max)). It is here shown tha...
Journal: CoRR
Volume: abs/1412.2821
Publication date: 2014